Add CloudWatch SEARCH() expression support for dynamic metric alarms#143
Add CloudWatch SEARCH() expression support for dynamic metric alarms#143jaredbrook wants to merge 5 commits intomasterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds support for CloudWatch SEARCH()-based dynamic metric alarms so alarms can target changing resources (e.g., replaced Auto Scaling Groups) without requiring fixed dimensions.
Changes:
- Extend alarm models to carry
search_expression/search_aggregation, plus interpolation support for${Resource::...}inside search expressions. - Emit CloudFormation alarms using
Metrics(metric math) whensearch_expressionis present, and relax validation to not requireMetricName/Namespacein that case. - Add documentation and RSpec coverage for search-expression alarms; bump gem version/dependencies.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| spec/search_expression_spec.rb | New specs covering model fields, template emission, interpolation, and compile-time validation behavior |
| lib/cfnguardian/models/alarm.rb | Add search_expression / search_aggregation fields to alarms |
| lib/cfnguardian/resources/base.rb | Interpolate ${Resource::...} variables inside search expressions |
| lib/cfnguardian/stacks/resources.rb | Emit AWS::CloudWatch::Alarm using Metrics/expressions when search_expression is set |
| lib/cfnguardian/compile.rb | Skip metric_name/namespace validation for search-expression alarms; validate empty SearchExpression |
| docs/search_expressions.md | New documentation describing configuration and limitations |
| docs/variables.md | Document variable support inside search expressions |
| docs/overview.md | Add Search Expression Alarms to docs TOC |
| lib/cfnguardian/version.rb | Bump version to 0.13.0 |
| Gemfile.lock | Lockfile updates for new version and dependency resolutions |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -0,0 +1,215 @@ | |||
| require 'spec_helper' | |||
| require 'json' | |||
| require 'yaml' | |||
There was a problem hiding this comment.
This spec uses FileUtils.mkdir_p/FileUtils.rm_f later but never require 'fileutils'. Without it, the spec will raise NameError: uninitialized constant FileUtils in a clean test environment. Add the missing require (or avoid FileUtils usage).
| require 'yaml' | |
| require 'yaml' | |
| require 'fileutils' |
| expect(search_alarms.length).to eq(1) | ||
| expect(search_alarms.first.search_expression).to include('SEARCH') | ||
| ensure | ||
| FileUtils.rm_f(fixture) |
There was a problem hiding this comment.
The spec creates spec/fixtures via FileUtils.mkdir_p(config_dir) but only removes the YAML file in the ensure block. This leaves the fixtures directory behind after the test run. Consider removing the directory as well (when empty) or using a temp directory (e.g., Dir.mktmpdir) to avoid polluting the working tree.
| FileUtils.rm_f(fixture) | |
| FileUtils.rm_f(fixture) | |
| FileUtils.rmdir(config_dir) if Dir.exist?(config_dir) && Dir.children(config_dir).empty? |
| # String interpolation for search expressions | ||
| if alarm.search_expression.is_a?(String) | ||
| alarm.search_expression = alarm.search_expression.gsub(/\${Resource::([A-Za-z]+)}/) do | ||
| resource_key = $1 | ||
| if @resource.has_key?(resource_key) | ||
| logger.debug "overriding alarm #{alarm.name} dimension key '#{k}' with value '#{@resource[resource_key]}'" | ||
| alarm.dimensions[k] = @resource[resource_key] | ||
| logger.debug "interpolating search_expression variable '#{resource_key}' with value '#{@resource[resource_key]}' for alarm #{alarm.name}" | ||
| @resource[resource_key] | ||
| else | ||
| "${Resource::#{resource_key}}" | ||
| end |
There was a problem hiding this comment.
Search expression interpolation only matches ${Resource::...} keys made of letters (/\${Resource::([A-Za-z]+)}/). This is more restrictive than the dimension-variable interpolation above and contradicts the docs that say “any key from the resource definition” (keys may contain digits/underscores). Consider broadening the regex (e.g., allow [A-Za-z0-9_]+), and prefer Regexp.last_match(1) over $1 for clarity.
| if resource.search_expression | ||
| if resource.search_expression.empty? | ||
| @errors << "CfnGuardian::AlarmPropertyError - alarm #{resource.name} for resource #{resource.resource_id} has an empty SearchExpression." | ||
| end |
There was a problem hiding this comment.
resource.search_expression.empty? assumes search_expression is a String. If a user misconfigures YAML (e.g., SearchExpression: []), this will raise a NoMethodError during validation instead of producing a helpful validation message. Consider validating search_expression.is_a?(String) and treating blank/whitespace-only strings as invalid (e.g., strip.empty?) to keep validation robust.
| def add_search_expression_alarm(alarm, actions) | ||
| search_expr = alarm.search_expression | ||
| aggregation = alarm.search_aggregation || 'MAX' | ||
|
|
||
| @template.declare do | ||
| CloudWatch_Alarm("#{alarm.resource_hash}#{alarm.group}#{alarm.name.gsub(/[^0-9a-zA-Z]/i, '')}#{alarm.type}"[0..255]) do | ||
| ActionsEnabled true | ||
| AlarmDescription "Guardian alarm #{alarm.name} for the resource #{alarm.resource_id} in alarm group #{alarm.group}" | ||
| AlarmName CfnGuardian::CloudWatch.get_alarm_name(alarm) | ||
| ComparisonOperator alarm.comparison_operator | ||
| EvaluationPeriods alarm.evaluation_periods | ||
| Threshold alarm.threshold | ||
| AlarmActions actions | ||
| OKActions actions unless alarm.ok_action_disabled | ||
| TreatMissingData alarm.treat_missing_data unless alarm.treat_missing_data.nil? | ||
| DatapointsToAlarm alarm.datapoints_to_alarm unless alarm.datapoints_to_alarm.nil? | ||
| Metrics [ | ||
| { | ||
| Id: 'search_expression', | ||
| Expression: search_expr, | ||
| ReturnData: false | ||
| }, | ||
| { | ||
| Id: 'aggregate', | ||
| Expression: "#{aggregation}(search_expression)", | ||
| ReturnData: true | ||
| } |
There was a problem hiding this comment.
search_aggregation is inserted directly into the metric-math expression ("#{aggregation}(search_expression)") without validation/normalization. A typo or lowercase value will generate an invalid CloudWatch expression and fail at deploy time. Consider upcasing the value and validating it against the supported set (MAX/MIN/AVG/SUM), surfacing a compile-time validation error when invalid.
| @@ -57,6 +65,38 @@ def add_alarm(alarm) | |||
| end | |||
| end | |||
| end | |||
|
|
|||
| def add_search_expression_alarm(alarm, actions) | |||
| search_expr = alarm.search_expression | |||
| aggregation = alarm.search_aggregation || 'MAX' | |||
|
|
|||
| @template.declare do | |||
| CloudWatch_Alarm("#{alarm.resource_hash}#{alarm.group}#{alarm.name.gsub(/[^0-9a-zA-Z]/i, '')}#{alarm.type}"[0..255]) do | |||
| ActionsEnabled true | |||
| AlarmDescription "Guardian alarm #{alarm.name} for the resource #{alarm.resource_id} in alarm group #{alarm.group}" | |||
| AlarmName CfnGuardian::CloudWatch.get_alarm_name(alarm) | |||
| ComparisonOperator alarm.comparison_operator | |||
| EvaluationPeriods alarm.evaluation_periods | |||
| Threshold alarm.threshold | |||
| AlarmActions actions | |||
| OKActions actions unless alarm.ok_action_disabled | |||
| TreatMissingData alarm.treat_missing_data unless alarm.treat_missing_data.nil? | |||
| DatapointsToAlarm alarm.datapoints_to_alarm unless alarm.datapoints_to_alarm.nil? | |||
| Metrics [ | |||
| { | |||
| Id: 'search_expression', | |||
| Expression: search_expr, | |||
| ReturnData: false | |||
| }, | |||
| { | |||
| Id: 'aggregate', | |||
| Expression: "#{aggregation}(search_expression)", | |||
| ReturnData: true | |||
| } | |||
| ] | |||
| end | |||
| end | |||
| end | |||
There was a problem hiding this comment.
add_search_expression_alarm duplicates most of the CloudWatch_Alarm property wiring from add_standard_alarm, which increases the chance of future drift (e.g., tags/unit/low-sample settings). Consider extracting the common alarm property assignment into a shared helper, and only branching for the mutually-exclusive metric configuration (Dimensions/MetricName/etc vs Metrics).
| Instead of emitting a CloudWatch alarm with fixed `Dimensions`, `MetricName`, `Namespace`, and `Statistic` properties, a search expression alarm emits `MetricDataQueries` with: | ||
|
|
||
| 1. A **SEARCH()** expression that dynamically matches metrics by partial or exact name | ||
| 2. An **aggregation function** (e.g. `MAX`, `AVG`, `SUM`) that reduces the matched metrics to a single time series for threshold evaluation | ||
|
|
||
| ## Configuration | ||
|
|
||
| Add `SearchExpression` and optionally `SearchAggregation` to an alarm template. When `SearchExpression` is set, the `Dimensions`, `MetricName`, `Namespace`, `Statistic`, and `Period` properties are not used since CloudWatch treats these as mutually exclusive with `MetricDataQueries`. |
There was a problem hiding this comment.
This doc says Guardian “emits MetricDataQueries”, but the CloudFormation AWS::CloudWatch::Alarm property name is Metrics (a list of MetricDataQuery objects). Consider adjusting wording to avoid confusion for readers cross-referencing the CFN spec.
| Instead of emitting a CloudWatch alarm with fixed `Dimensions`, `MetricName`, `Namespace`, and `Statistic` properties, a search expression alarm emits `MetricDataQueries` with: | |
| 1. A **SEARCH()** expression that dynamically matches metrics by partial or exact name | |
| 2. An **aggregation function** (e.g. `MAX`, `AVG`, `SUM`) that reduces the matched metrics to a single time series for threshold evaluation | |
| ## Configuration | |
| Add `SearchExpression` and optionally `SearchAggregation` to an alarm template. When `SearchExpression` is set, the `Dimensions`, `MetricName`, `Namespace`, `Statistic`, and `Period` properties are not used since CloudWatch treats these as mutually exclusive with `MetricDataQueries`. | |
| Instead of emitting a CloudWatch alarm with fixed `Dimensions`, `MetricName`, `Namespace`, and `Statistic` properties, a search expression alarm emits the CloudFormation `Metrics` property (a list of `MetricDataQuery` objects) with: | |
| 1. A **SEARCH()** expression that dynamically matches metrics by partial or exact name | |
| 2. An **aggregation function** (e.g. `MAX`, `AVG`, `SUM`) that reduces the matched metrics to a single time series for threshold evaluation | |
| ## Configuration | |
| Add `SearchExpression` and optionally `SearchAggregation` to an alarm template. When `SearchExpression` is set, the `Dimensions`, `MetricName`, `Namespace`, `Statistic`, and `Period` properties are not used since CloudWatch treats these as mutually exclusive with the alarm `Metrics` property. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| context 'when search expression alarm has no metric_name or namespace' do | ||
| it 'does not raise validation errors' do | ||
| fixture = File.join(config_dir, 'search_expression_alarms.yaml') | ||
| FileUtils.mkdir_p(config_dir) | ||
| File.write(fixture, { | ||
| 'Resources' => { | ||
| 'AutoScalingGroup' => [{ 'Id' => 'my-app-AsgGroup-abc123' }] | ||
| }, | ||
| 'Templates' => { | ||
| 'AutoScalingGroup' => { | ||
| 'CPUUtilizationHighBase' => { | ||
| 'SearchExpression' => "SEARCH('{AWS/EC2,AutoScalingGroupName} MetricName=\"CPUUtilization\" my-app', 'Minimum', 60)", | ||
| 'SearchAggregation' => 'MAX' | ||
| }, | ||
| 'StatusCheckFailed' => false | ||
| } |
There was a problem hiding this comment.
This example is described as having “no metric_name or namespace”, but it overrides an existing AutoScalingGroup default alarm (CPUUtilizationHighBase) that already sets both fields, so it doesn’t actually exercise the validation change. Consider adding a new alarm (one that doesn’t exist in defaults) with only SearchExpression set, and assert that validation passes and the created alarm has metric_name (and/or namespace) still nil.
| run: | | ||
| gem install rspec | ||
| rspec | ||
| gem install bundler -v 2.4.22 |
There was a problem hiding this comment.
The workflow installs Bundler 2.4.22, but the lockfile was generated with Bundler 2.3.19 (BUNDLED WITH). This often produces warnings and may cause unintended lockfile churn. Consider aligning the workflow Bundler version with the lockfile, or update the lockfile’s BUNDLED WITH to match the workflow version.
| gem install bundler -v 2.4.22 | |
| gem install bundler -v 2.3.19 |
| Metrics [ | ||
| { | ||
| Id: 'search_expression', | ||
| Expression: search_expr, | ||
| ReturnData: false |
There was a problem hiding this comment.
When using the CloudFormation Metrics property (metric math alarms), ThresholdMetricId is required to tell CloudWatch which query ID to evaluate against Threshold. Without it, the generated alarm resource may fail template validation or evaluate an unexpected metric. Set ThresholdMetricId to the ID of the return-data query (likely aggregate).
| logger.debug "overriding alarm #{alarm.name} dimension key '#{k}' with value '#{@resource[resource_key]}'" | ||
| alarm.dimensions[k] = @resource[resource_key] | ||
| logger.debug "interpolating search_expression variable '#{resource_key}' with value '#{@resource[resource_key]}' for alarm #{alarm.name}" | ||
| @resource[resource_key] |
There was a problem hiding this comment.
The gsub replacement block must return a String. If a resource value is numeric/boolean (e.g., a Port), returning it directly will raise TypeError during interpolation. Convert interpolated values to strings (e.g., @resource[resource_key].to_s) before returning from the block.
| @resource[resource_key] | |
| @resource[resource_key].to_s |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if resource.search_expression | ||
| if !resource.search_expression.is_a?(String) || resource.search_expression.strip.empty? | ||
| @errors << "CfnGuardian::AlarmPropertyError - alarm #{resource.name} for resource #{resource.resource_id} has an invalid SearchExpression. Must be a non-empty string." | ||
| end | ||
| if resource.search_aggregation | ||
| valid_aggregations = %w(MAX MIN AVG SUM) | ||
| normalized = resource.search_aggregation.to_s.upcase | ||
| if valid_aggregations.include?(normalized) | ||
| resource.search_aggregation = normalized | ||
| else | ||
| @errors << "CfnGuardian::AlarmPropertyError - alarm #{resource.name} for resource #{resource.resource_id} has invalid SearchAggregation '#{resource.search_aggregation}'. Must be one of: #{valid_aggregations.join(', ')}." | ||
| end |
There was a problem hiding this comment.
New validation behavior was added for SearchExpression (non-empty string) and SearchAggregation (must be one of MAX/MIN/AVG/SUM and normalized to uppercase), but there are no specs asserting these error cases and normalization. Add RSpec coverage that (1) invalid/blank SearchExpression causes CfnGuardian::ValidationError, and (2) invalid SearchAggregation is rejected while valid lowercase values are normalized.
| | Property | Required | Default | Description | | ||
| |---|---|---|---| | ||
| | `SearchExpression` | Yes | - | A CloudWatch SEARCH() expression string. Supports `${Resource::...}` [variables](variables.md). | | ||
| | `SearchAggregation` | No | `MAX` | Aggregation function applied to the search results. Valid values: `MAX`, `MIN`, `AVG`, `SUM`. | | ||
|
|
There was a problem hiding this comment.
The markdown table under "Properties" is malformed (lines start with ||), which renders as an empty first column in most markdown parsers. Use standard table syntax with a single leading/trailing | per row.
| require 'fileutils' | ||
| require 'tmpdir' | ||
| require 'term/ansicolor' |
There was a problem hiding this comment.
fileutils and term/ansicolor are required but not used anywhere in this spec. Removing unused requires reduces load time and avoids pulling extra dependencies into the test process unnecessarily.
| require 'fileutils' | |
| require 'tmpdir' | |
| require 'term/ansicolor' | |
| require 'tmpdir' |
| def add_alarm(alarm) | ||
| actions = alarm.alarm_action.kind_of?(Array) ? alarm.alarm_action.map{|action| Ref(action)} : [Ref(alarm.alarm_action)] | ||
| actions.concat alarm.maintenance_groups.map {|mg| Ref(mg)} if alarm.maintenance_groups.any? | ||
| use_search = alarm.search_expression ? true : false |
There was a problem hiding this comment.
use_search is true for any non-nil value, including an empty string. Since '' is truthy in Ruby, this would emit a Metrics-based alarm with a blank Expression, producing an invalid CloudFormation template. Consider basing use_search on alarm.search_expression.is_a?(String) && !alarm.search_expression.strip.empty? (or raising early) to align with validation behavior.
| use_search = alarm.search_expression ? true : false | |
| use_search = alarm.search_expression.is_a?(String) && !alarm.search_expression.strip.empty? |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Expression: "#{aggregation}(search_expression)", | ||
| ReturnData: true | ||
| } | ||
| ] |
There was a problem hiding this comment.
When using the CloudWatch Alarm Metrics property (metric math), CloudFormation expects a ThresholdMetricId to indicate which MetricDataQuery is evaluated against Threshold/ComparisonOperator. This block builds queries search_expression and aggregate but never sets ThresholdMetricId, which can cause the alarm to fail creation or evaluate the wrong series. Set ThresholdMetricId to the query id that returns data (e.g. aggregate).
| ] | |
| ] | |
| ThresholdMetricId 'aggregate' |
No description provided.